
    Resource-efficient fast prediction in healthcare data analytics: A pruned Random Forest regression approach

    In predictive healthcare data analytics, high accuracy is vital: low accuracy can lead to misdiagnosis, which can have serious health consequences, including death. Fast prediction is also an important requirement, particularly for machines and mobile devices with limited memory and processing power. For real-time healthcare analytics applications, especially those that run on mobile devices, both traits (high accuracy and fast prediction) are highly desirable. In this paper, we propose an ensemble regression technique based on CLUB-DRF, a pruned Random Forest that possesses these features. The speed and accuracy of the method are demonstrated by an experimental study on three medical data sets covering three different diseases.
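    The abstract does not include code, but a minimal sketch of the clustering-based pruning idea behind CLUB-DRF might look like the following: trees are clustered by the similarity of their predictions on a held-out set, and one representative tree is kept per cluster. The function names, the use of scikit-learn, and the choice of KMeans are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of clustering-based Random Forest pruning (CLUB-DRF-style).
# The scikit-learn API and the KMeans-based tree clustering are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor

def prune_forest(forest, X_val, n_keep=10):
    """Keep one representative tree per cluster of trees with similar predictions."""
    # Represent each tree by its predictions on a validation set.
    tree_preds = np.array([t.predict(X_val) for t in forest.estimators_])
    km = KMeans(n_clusters=n_keep, n_init=10, random_state=0).fit(tree_preds)
    kept = []
    for c in range(n_keep):
        members = np.where(km.labels_ == c)[0]
        # Pick the member closest to the cluster centroid as the representative.
        dists = np.linalg.norm(tree_preds[members] - km.cluster_centers_[c], axis=1)
        kept.append(forest.estimators_[members[np.argmin(dists)]])
    return kept

def predict_pruned(trees, X):
    return np.mean([t.predict(X) for t in trees], axis=0)

# Usage sketch:
#   rf = RandomForestRegressor(n_estimators=500).fit(X_train, y_train)
#   small = prune_forest(rf, X_val, n_keep=20)
#   y_hat = predict_pruned(small, X_test)
```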

    Adaptive One-Class Ensemble-based Anomaly Detection: An Application to Insider Threats

    The malicious insider threat is of growing concern to organisations, due to the continuously increasing number of insider incidents. The absence of previously logged insider threats shapes insider threat detection as a one-class anomaly detection problem. A common shortcoming of existing data mining approaches to detecting insider threats is the high number of False Positives (FPs), i.e. normal behaviour predicted as anomalous. To address this shortcoming, in this paper we propose an anomaly detection framework with two components: a one-class modelling component and a progressive update component. To allow the detection of anomalous instances that closely resemble normal instances, the one-class modelling component applies class decomposition to the normal class data to create k clusters, then trains an ensemble of k base anomaly detection models (One-Class Support Vector Machine or Isolation Forest), with the data in each cluster used to construct one of the k base models. The progressive update component updates each of the k models with sequentially acquired FP chunks, i.e. segments holding a predetermined number of FPs. It includes an oversampling method to generate artificial samples for the FPs in each chunk, then retrains each model and adapts its decision boundary, with the aim of reducing the number of future FPs. A variety of experiments is carried out on synthetic data sets generated at Carnegie Mellon University to test the effectiveness of the proposed framework and its components. The results show that the proposed framework reports the highest F1 measure and fewer FPs compared to the base algorithms, and that it detects all the insider threats in the data sets.
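    As a rough illustration of the one-class modelling component (not the authors' code), normal data can be split into k clusters and one one-class model fitted per cluster; an instance is then flagged anomalous only if every base model rejects it. The library choice (scikit-learn), the hyperparameters, and the unanimous-rejection rule are assumptions.

```python
# Hypothetical sketch of a one-class ensemble built on class decomposition.
# Parameters and the "anomalous only if all models reject" rule are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import OneClassSVM

def fit_one_class_ensemble(X_normal, k=5):
    """Decompose normal data into k clusters and fit one base model per cluster."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_normal)
    return [OneClassSVM(nu=0.05, gamma="scale").fit(X_normal[labels == c])
            for c in range(k)]

def predict_anomaly(models, X):
    # OneClassSVM.predict returns +1 for inliers and -1 for outliers.
    votes = np.array([m.predict(X) for m in models])
    # An instance is anomalous only if no base model accepts it as normal.
    return np.all(votes == -1, axis=0)
```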

    DeTraC: Transfer Learning of Class Decomposed Medical Images in Convolutional Neural Networks

    Due to the availability of large-scale annotated image datasets, considerable progress has been made in deep convolutional neural networks (CNNs) for image classification tasks. CNNs enable learning highly representative and hierarchical local image features directly from data. However, the availability of annotated data, especially in the medical imaging domain, remains the biggest challenge in the field. Transfer learning can provide a promising and effective solution by transferring knowledge from generic image recognition tasks to medical image classification. However, due to irregularities in the dataset distribution, transfer learning alone often fails to provide a robust solution. Class decomposition makes the class boundaries of a dataset easier to learn and can consequently deal with irregularities in the data distribution. Motivated by this challenging problem, this paper presents the Decompose, Transfer, and Compose (DeTraC) approach, a novel CNN architecture that uses class decomposition to improve the performance of medical image classification with transfer learning. DeTraC enables learning at the subclass level, where classes can be more separable, with the prospect of faster convergence. We validated our proposed approach on three different image cohorts: chest X-ray images, histological images of human colorectal cancer, and digital mammograms. We compared DeTraC with state-of-the-art CNN models to demonstrate its high performance in terms of accuracy, sensitivity, and specificity.
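    The decompose and compose steps around the transferred CNN could be sketched roughly as follows: each original class is split into subclasses by clustering its feature representations, the network is fine-tuned on the subclass labels, and subclass probabilities are summed back into their parent classes at prediction time. The feature source, the number of subclasses per class, and the composition rule shown here are assumptions; the paper defines the exact DeTraC procedure.

```python
# Hypothetical sketch of the "decompose" and "compose" steps of a
# class-decomposition pipeline; not the paper's exact DeTraC implementation.
import numpy as np
from sklearn.cluster import KMeans

def decompose_labels(features, labels, n_sub=2):
    """Split each original class (ids 0..C-1) into n_sub subclasses by clustering."""
    sub_labels = np.empty(len(labels), dtype=int)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        clusters = KMeans(n_clusters=n_sub, n_init=10,
                          random_state=0).fit_predict(features[idx])
        sub_labels[idx] = c * n_sub + clusters  # subclass id = class id * n_sub + cluster
    return sub_labels  # fine-tune the transferred CNN on these subclass labels

def compose_predictions(sub_probs, n_sub=2):
    """Sum subclass probabilities back into their parent classes."""
    n_classes = sub_probs.shape[1] // n_sub
    return np.add.reduceat(sub_probs, np.arange(0, n_classes * n_sub, n_sub), axis=1)
```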

    A Non-intrusive Heuristic for Energy Messaging Intervention Modelled using a Novel Agent-based Approach

    In response to the increased energy consumption in residential buildings, various efforts have been devoted to increasing occupant awareness using energy feedback systems. However, it has been shown that the feedback provided by these systems is not enough to inform occupant actions to reduce energy consumption. Another approach is to control energy consumption using automated energy management systems, but the automatic control of appliances takes away the occupants' sense of control, which has proved uncomfortable in many cases. This paper proposes an energy messaging intervention that keeps occupants in control while supporting them with actionable messages. The messages inform occupants about energy waste incidents happening in their house in real time, enabling them to take action to reduce their consumption. In addition, a heuristic is defined to make the intervention non-intrusive by controlling the rate and timing of the messages sent to occupants. The proposed intervention is evaluated in a novel layered agent-based model. The first layer of the model generates detailed energy consumption and realistic occupant activities. The second layer simulates the effect of peer pressure on the energy consumption behaviour of individuals. The third layer is a customisable layer that simulates energy interventions; the intervention implemented in this paper is the proposed non-intrusive messaging intervention. A number of scenarios are presented in the experiments to show how the model can be used to evaluate the proposed intervention and achieve energy efficiency targets.
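    A minimal sketch of a rate- and time-limited messaging heuristic of the kind described might look like the following; the daily message budget and the overnight quiet-hour window are illustrative assumptions, not the paper's parameters.

```python
# Hypothetical sketch of a non-intrusive messaging heuristic: a waste-incident
# message is delivered only outside quiet hours and within a daily message budget.
# The daily cap and the overnight quiet window are illustrative assumptions.
from datetime import datetime

class MessagingHeuristic:
    def __init__(self, max_per_day=3, quiet_hours=(22, 7)):
        self.max_per_day = max_per_day
        self.quiet_hours = quiet_hours  # assumed overnight window (start, end)
        self.sent_today = 0
        self.current_day = None

    def should_send(self, now: datetime) -> bool:
        if self.current_day != now.date():  # reset the daily budget
            self.current_day, self.sent_today = now.date(), 0
        start, end = self.quiet_hours
        in_quiet = now.hour >= start or now.hour < end  # overnight quiet window
        if in_quiet or self.sent_today >= self.max_per_day:
            return False
        self.sent_today += 1
        return True
```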

    Adaptive Mining Techniques for Data Streams Using Algorithm Output Granularity

    Mining data streams is an emerging area of research, given the potentially large number of business and scientific applications. A significant challenge in analyzing/mining data streams is the high data rate of the stream. In this paper, we propose a novel approach to cope with the high data rate of incoming data streams, which we term "algorithm output granularity". It is a resource-aware approach that adapts to available memory, time constraints, and the data stream rate. The approach is generic and applicable to clustering, classification, and frequent-item counting mining techniques. We have developed a data stream clustering algorithm based on the algorithm output granularity approach. We present this algorithm and discuss its implementation and empirical evaluation. The experiments show acceptable accuracy accompanied by run-time efficiency: the proposed algorithm outperforms K-means in terms of running time while preserving accuracy.
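    A rough sketch of the output-granularity idea applied to stream clustering is given below: new points are merged into the nearest centre if they fall within a distance threshold, and when the number of stored centres approaches a memory budget the threshold is relaxed so the output becomes coarser. The class name, the adaptation rule, and all parameters are assumptions, not the algorithm from the paper.

```python
# Hypothetical sketch of threshold-based stream clustering with an
# output-granularity control; the relaxation rule and parameters are assumptions.
import numpy as np

class AOGStreamClusterer:
    def __init__(self, threshold=1.0, max_centres=100, relax=1.5):
        self.threshold, self.max_centres, self.relax = threshold, max_centres, relax
        self.centres, self.counts = [], []

    def update(self, x):
        x = np.asarray(x, dtype=float)
        if self.centres:
            d = np.linalg.norm(np.array(self.centres) - x, axis=1)
            j = int(np.argmin(d))
            if d[j] <= self.threshold:
                # Merge into the nearest centre (incremental mean).
                self.counts[j] += 1
                self.centres[j] += (x - self.centres[j]) / self.counts[j]
                return
        self.centres.append(x)
        self.counts.append(1)
        if len(self.centres) >= self.max_centres:
            # Output granularity: coarsen the clustering when memory is nearly full.
            self.threshold *= self.relax
```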

    Ensemble Dynamics in Non-stationary Data Stream Classification

    Data stream classification is the process of learning supervised models from continuous labelled examples arriving as an infinite stream that, in most cases, can be read only once by the data mining algorithm. One of the most challenging problems in this process is how to learn such models in non-stationary environments, where the data/class distribution evolves over time; this phenomenon is called concept drift. Ensemble learning techniques have proven effective at adapting to concept drift. Ensemble learning is the process of learning a number of classifiers and combining them to predict incoming data using a combination rule. These techniques should incrementally process and learn from existing data within limited memory and time to predict incoming instances, and should also cope with different types of concept drift, including incremental, gradual, abrupt and recurring drift. A large number of applications can benefit from data stream classification over non-stationary data, including weather forecasting, stock market analysis, spam filtering, credit card fraud detection, traffic monitoring, and sensor data analysis in Internet of Things (IoT) networks, to mention a few. Since each application has its own characteristics and conditions, it is difficult to introduce a single approach suitable for all problem domains. This chapter studies the dynamic behaviour of existing ensemble methods (e.g. the addition, removal and update of classifiers) in non-stationary data stream classification. It proposes a new, compact, yet informative formalisation of state-of-the-art methods. The chapter also presents the results of our experiments comparing a diverse selection of the best performing algorithms on several benchmark data sets with different types of concept drift from different problem domains.
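    To make the ensemble dynamics concrete, the sketch below shows a generic chunk-based weighted ensemble: each incoming chunk trains a new base classifier, existing members are re-weighted by their accuracy on the latest chunk, and the weakest member is removed when the ensemble is full. This is a common weighted-ensemble pattern offered for illustration, not any specific algorithm studied in the chapter; the class and parameter names are assumptions.

```python
# Hypothetical sketch of a chunk-based weighted ensemble for drifting streams.
# Base learner, weighting scheme and pruning rule are illustrative assumptions.
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

class ChunkEnsemble:
    def __init__(self, base=DecisionTreeClassifier(max_depth=5), max_members=10):
        self.base, self.max_members = base, max_members
        self.members, self.weights = [], []

    def partial_fit_chunk(self, X, y):
        # Re-weight existing members by their accuracy on the newest chunk.
        self.weights = [m.score(X, y) for m in self.members]
        new = clone(self.base).fit(X, y)
        self.members.append(new)
        self.weights.append(new.score(X, y))
        if len(self.members) > self.max_members:  # drop the weakest member
            worst = int(np.argmin(self.weights))
            del self.members[worst], self.weights[worst]

    def predict(self, X):
        preds = np.array([m.predict(X) for m in self.members])  # (members, samples)
        classes = np.unique(preds)
        # Weighted vote: sum member weights per predicted class, take the argmax.
        scores = np.array([[np.sum([w for p, w in zip(preds[:, i], self.weights) if p == c])
                            for c in classes] for i in range(preds.shape[1])])
        return classes[np.argmax(scores, axis=1)]
```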